Modular multiplication method with precomputation using one known operand

ABSTRACT

A modular multiplication method implemented in an electronic digital processing system takes advantage of the case where one of the operands W is known in advance or used multiple times with different second operands V to speed calculation. The operands V and W and the modulus M may be integers or polynomials over a variable X. A possible choice for the type of polynomials can be polynomials of the binary finite field GF(2^(N)). Once operand W is loaded into a data storage location, a value P=└(W·X^(n+δ))/M┘ is pre-computed by the processing system. Then when a second operand V is loaded, the quotient q̂ for the product V·W being reduced modulo M is quickly estimated, q̂=└(V·P)/X^(n+δ)┘, optionally randomized, q′=q̂−E, and can be used to obtain the remainder r′=V·W−q′·M, which is congruent to (V·W) mod M. A final reduction can be carried out, and the later steps repeated with other second operands V.

RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 11/556,894, filed on Nov. 6, 2006, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present invention relates to methods or arrangements for processing numerical data by electrical programmable computers, digital processing systems, logic circuitry, or similar electronic hardware together with any associated software, and in particular relates to arithmetic processing and calculating methods involving finite field, residue or congruence operations, including modular multiplication operations upon integers or polynomials, especially methods based upon or derived from the Barrett reduction method.

BACKGROUND ART

Numerous cryptographic algorithms make use of large-integer multiplication (or exponentiation) and reduction of the product to a residue value that is congruent for a specified modulus that is related to the cryptographic key. Some cryptographic algorithms, including the AES/Rijndael block cipher and also those based on discrete logarithms and elliptic curves, perform arithmetic operations on polynomials in a finite field, such as the binary field GF(2^(n)), including multiplication (or exponentiation) and modular reduction operations on such polynomials.

Mathematical computations in cryptographic algorithms, especially those performed by hardware-implemented cryptographic systems (such as RISC-based smart cards), may be susceptible to various side-channel attacks, including power analysis and timing attacks. An attacker externally monitors aspects of the hardware that are accessible, such as current through chip pads or electromagnetic emissions from a chip, in order to obtain information about internal operations which may be subjected to various analyses in an effort to uncover the encryption key. Therefore, it is important that computations be secured so that information about the key cannot be obtained.

Typically, secure microcontrollers for smart cards use various kinds of hardware-based countermeasures to thwart such attacks. While some software-level countermeasures introduced into a cryptographic algorithm itself might also be considered, it is very important that any such countermeasures not adversely affect the speed or accuracy of the underlying computations. Not all of the internal operations of a cryptographic algorithm are as readily adaptable so as to incorporate software countermeasures without appreciable slowing and without jeopardizing accuracy of a final result.

Arithmetic operations in particular, including modular multiplication, either upon integers or upon polynomials with integer coefficients, generally require a specific result from operating upon given operands. Any changes that would obtain an erroneous final result would clearly be unwelcome. At the same time, it is important that these computations be fast and accurate. Multiplication and reduction, whether operated upon large integers or upon polynomials in a finite field, is usually the most computationally intensive portion of a cryptographic algorithm. In electronic digital hardware, various computational methods have been developed for efficiently performing modular multiplication, including those based upon the Barrett reduction method.

One particular case that frequently occurs in cryptographic applications is where one of the operands of a modular multiplication (or exponentiation) operation is known in advance or used several times. It would be desirable to take advantage of such occurrences in order to speed up the computation.

SUMMARY DISCLOSURE

The present invention is a method implemented in an electronic digital processing system that performs fast modular multiplication computations upon integers or polynomials. In particular, a precomputation is carried out using one operand that is known in advance, in order that the modular reduction quotient can be quickly estimated for any given product involving that operand. For added cryptographic security, the estimated quotient so obtained can be optionally reduced by a random value. The reduced product will then be larger than or equal to, yet still congruent to, the exact residue value for the modular multiplication. In some cryptographic algorithms, it is possible to work with the larger randomized, but congruent, value without affecting the final result. In other algorithms, the exact residue value may need to be found using a few additional subtractions with the modulus, but the intermediate randomization is still useful in resisting cryptoanalytic attacks.

More specifically, where V and W are two operands of which W is known in advance, M is the modulus, and a product congruent to (V×W) mod M is to be found, the method precomputes P=└(W×2^(n+δ))/M┘, where n is greater than the size of the larger of W and M. The choice of small integer increment δ depends upon the maximum size of the other operand (the one not known in advance), and determines the permissible rounding error obtained for the quotient estimation. For each modular multiplication involving the pre-known operand W, an estimated quotient q̂=└(V×P)/2^(n+δ)┘ is obtained. Then a remainder value r̂=V×W−q̂×M is calculated. If the estimated quotient is reduced by a random value, q′=q̂−E, then the randomized estimated quotient q′ is used to obtain a remainder value r′ congruent with the exact residue.

This sequence of steps can be carried out using either integer or polynomial operands. For polynomial operands over a variable X, when calculating both P and q̂, X^(n+δ) replaces 2^(n+δ). To clarify the differences between integers and polynomials: (1) For integers, if the maximum possible size of V is such that V<2^(n+φ), and if δ≧φ, then the result q̂ is less than or equal to the actual quotient Q with a maximum error of 1, i.e., Q−1≦q̂≦Q. But if δ<φ, the result q̂ is less than or equal to the actual quotient Q with a maximum error defined by Q−2^(φ−δ)≦q̂≦Q; (2) For polynomials, if the maximum size (i.e., degree) of polynomial V(x) is such that deg(V(x))<n+φ, and if δ≧φ−1, then the result q̂ is equal to the actual quotient Q. But if δ<φ−1, the result q̂ can be different from the actual quotient Q with a maximum error deg(Q−q̂)≦φ−δ−2.
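
The integer bounds stated above can be checked with a short calculation. The following is only a sketch, assuming V>0, writing Q for the exact quotient └(V×W)/M┘ and writing the estimated quotient as \hat{q}; the intermediate inequalities are introduced here for illustration. From the definition of P as a floor,

$\frac{W \cdot 2^{n+\delta}}{M} - 1 < P \le \frac{W \cdot 2^{n+\delta}}{M}.$

Multiplying by $V / 2^{n+\delta}$ and using $V < 2^{n+\varphi}$ gives

$\frac{V W}{M} - 2^{\varphi-\delta} < \frac{V P}{2^{n+\delta}} \le \frac{V W}{M}.$

Taking the floor of the middle term loses less than 1, so

$Q - 2^{\varphi-\delta} - 1 < \hat{q} \le Q.$

When δ≧φ the term $2^{\varphi-\delta}$ is at most 1, and since both sides are integers the lower bound becomes $Q - 1 \le \hat{q}$. When δ<φ the term $2^{\varphi-\delta}$ is itself an integer, and the lower bound becomes $Q - 2^{\varphi-\delta} \le \hat{q}$.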

The computation method is easily implemented with processing hardware, or by executing an equivalent firmware or software program in a data processor or computer. Exemplary hardware used to execute the modular multiplication may include an arithmetic logic unit (ALU) with multiplication-accumulate (MAC) circuitry, which might be selectable to perform either natural or polynomial arithmetic, and which could, if desired, be dedicated to finite field operations. Such a computation unit, with memory access, operates under the control of an operation sequencer executing firmware to carry out the modular multiplication steps. A random number generator may be provided to inject a random value into an estimated quotient value used for the modular reduction of the product. It is also well within the level of skill of hardware system designers to implement the method entirely in hardware, using, for example, a field programmable gate array (FPGA) or application-specific integrated circuit (ASIC).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic plan view of exemplary processor hardware for use in executing the modular multiplication in accord with the present invention.

FIG. 2 is a flow diagram illustrating the general steps in the modular multiplication method of the present invention for integers.

FIG. 3 is a flow diagram illustrating the general steps in the modular multiplication method of the present invention for polynomials.

DETAILED DESCRIPTION

With reference to FIG. 1, computational processor hardware for executing modular multiplication in accord with the present invention may include an arithmetic-logic unit (ALU) 10, or similar computational circuitry containing a hardware multiplier, for executing numerical operations, including multiplication, upon the provided operands. The ALU 10 generally has access to memory (RAM) 12 and various working registers 14. An operation sequencer 16 comprises logic circuitry for controlling the ALU, including data transfers to and from the memory 12 and registers 14, in accord with firmware or software instructions for the set of operations used to carry out the modular multiplication. Operation sequencer 16 may access operation parameters in the form of pointers stored in registers 18 that enable the operation sequencer 16 to locate an operand within the RAM 12, as well as information such as the operand sizes, carry injection control information, the destination address of intermediate results, etc. The hardware may also include a pseudo-random number generator circuit 20 that performs calculations and outputs a random numerical value (interpreted as either an integer or a polynomial). This random generator 20 may be accessed by the ALU 10, as directed by the operation sequencer in accord with program instructions implementing the modular multiplication method of the present invention, in order to inject a randomized error quantity Rand into the quotient estimation, as described herein.

Modular multiplication of two operands, whether of integers or polynomials, typically consists in calculating a product of the two numbers, and then processing a modular reduction of the product. Modular reduction generally solves r≡X mod M≡X−└X/M┘ M, where r is the residue value to be found which is congruent to X for a modulus M, and the symbol └a┘ represents the floor function (the largest integer≦a) so that q=└X/M┘ corresponds to an integer division operation to find a quotient q. In the present case, the numerical value X, whether an integer or a polynomial, is the product of two operands, X=V×W, where the operands V and W are themselves either integers or polynomials. Thus, the residue r=V×W−q×M. Barrett's reduction method involves pre-calculating and storing a scaled estimate of the modulus' reciprocal, M⁻¹, and replacing the long division with multiplications and word shifts to obtain an estimated quotient q̂. Obtaining the estimated quotient q̂ is much faster than calculating the true quotient. When the estimated quotient q̂ is used in place of the true quotient, the resulting remainder r̂ will be slightly larger than, but congruent with, the residue value r. The exact residue value r, if desired, can be obtained from the remainder r̂ by a final strict reduction. The present invention modifies this approach still further when one of the operands is known in advance or is used many times in the execution of a given algorithm.
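
For orientation, the conventional Barrett approach described above can be sketched as follows. This is a simplified single-shift variant written for illustration only, not the method of the present invention; the names barrett_setup and barrett_reduce and the scaling 2^(2k), with k the bit length of M, are assumptions of the sketch.

```python
def barrett_setup(M):
    """Precompute k and mu = floor(2**(2k) / M), a scaled reciprocal of M."""
    k = M.bit_length()
    mu = (1 << (2 * k)) // M
    return k, mu


def barrett_reduce(x, M, k, mu):
    """Reduce 0 <= x < 2**(2k) modulo M without a long division."""
    if not 0 <= x < (1 << (2 * k)):
        raise ValueError("x out of range for this precomputation")
    q_hat = (x * mu) >> (2 * k)   # estimated quotient, never above the true one
    r = x - q_hat * M             # congruent to x modulo M
    while r >= M:                 # final strict reduction
        r -= M
    return r


if __name__ == "__main__":
    M = 1000003
    k, mu = barrett_setup(M)
    print(barrett_reduce(123456789 * 987, M, k, mu) == (123456789 * 987) % M)
```

Here the reciprocal depends only on M, so the product V×W must still be formed in full before the quotient can be estimated.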

With reference to FIG. 2, in order to carry out a processor-implemented function R:=(V×W) mod M, on one or more operands V, where the other operand W is known in advance, begin by loading (step 30) the operand W that is known in advance, then pre-computing (step 32) a value P:=└(W×2^(n+δ))/M┘. This value P will be used for efficiently estimating quotient values needed to quickly reduce the products of W with one or more operands V. The integer n in the expression 2^(n+δ) is the size in bits of the larger of the known operand W and the modulus M, so that W≦2^(n) and M≦2^(n). The choice of the integer δ depends upon the maximum possible size of the other operand V. If V<2^(n+φ), then we can choose δ≧φ and we will obtain a good estimated quotient that satisfies Q−1≦q̂≦Q, where Q is the real quotient. Alternatively, we can choose δ<φ for a faster quotient estimation, but with a greater degree of rounding, so that the estimated quotient will differ from the exact quotient up to some maximum error determined by our choice of δ. The choice δ<φ may be made, for example, if a bigger error on the quotient is accepted, or if a randomization is applied. If δ<φ, the result is less than or equal to the real quotient with an error boundary Q−2^(φ−δ)≦q̂≦Q, where Q is the real quotient. If a randomization is applied with a maximum boundary, the error boundary may be equal to or near the random boundary. If 0≦E<2^(s), where E is the random value, then we can take φ−δ=s, so δ=φ−s. As the values of δ are defined by inequalities, it is possible to round them to more practical values, if needed.
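
The effect of the choice of δ on the quotient estimate can be illustrated with a minimal sketch of step 32 and of the estimate it enables. The function names, the sample values of W and M, and the use of Python's random module to draw test operands are all assumptions made for illustration.

```python
import random


def precompute_P(W, M, delta):
    """Return (n, P) with P = floor(W * 2**(n + delta) / M)  (step 32)."""
    n = max(W.bit_length(), M.bit_length())   # size in bits of the larger of W and M
    P = (W << (n + delta)) // M
    return n, P


def estimate_quotient(V, P, n, delta):
    """Return the estimated quotient floor(V * P / 2**(n + delta))."""
    return (V * P) >> (n + delta)


def max_quotient_error(W, M, phi, delta, trials=1000, seed=1):
    """Largest observed Q - q_hat over random operands V < 2**(n + phi)."""
    rng = random.Random(seed)
    n, P = precompute_P(W, M, delta)
    worst = 0
    for _ in range(trials):
        V = rng.getrandbits(n + phi)
        Q = (V * W) // M                      # exact quotient, for comparison only
        q_hat = estimate_quotient(V, P, n, delta)
        assert q_hat <= Q                     # the estimate never overshoots
        worst = max(worst, Q - q_hat)
    return worst


if __name__ == "__main__":
    W, M, phi = 0xC0FFEE, 0xFFFFFD, 8
    print(max_quotient_error(W, M, phi, delta=phi))       # at most 1
    print(max_quotient_error(W, M, phi, delta=phi - 3))   # at most 2**3
```

The exact quotient Q is computed here only to measure the error; a real implementation would never form it.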

Next, we load (step 34) a first of the operands V for which we wish to calculate a modular product with pre-known operand W. The quotient is estimated (step 36) as q̂:=└(V×P)/2^(n+δ)┘. The estimated quotient q̂ can be optionally diminished (step 40) by a random value E generated (step 38) by a pseudo-random number generator circuit 20 (in FIG. 1), q′:=q̂−E. As an option, random value E may have a size of no more than a half-word so as to limit the potential error contributed by that random value E. Randomizing provides a layer of security against various cryptoanalytic attacks that rely upon consistency in power usage to determine the modulus M, which may be derived from or otherwise related to a cryptographic key. Introducing the random value E causes the modular multiplication operation to differ from one execution to the next, while still producing a congruent result R′. Alternatively, we may keep near the quotient q by leaving the estimated quotient unchanged, q′:=q̂.

In either case, the quotient value q′ is used to compute a remainder R′ in the modular multiplication operation (step 44), where R′:=(V×W)−(q′×M). The remainder R′ will usually be larger than the modulus M, because the quotient value used q′ is not exactly equal to actual quotient q. Nevertheless R′ is congruent to the residue value for the modular multiplication. Depending on the needs of the particular application, the residue R can be calculated from the remainder R′ by applying subtractions (step 46) of the modulus M until the number is smaller than the modulus M. Then the residue value R can be returned (step 48), possibly together with the particular operand V, for use in the rest of the cryptographic system. Alternatively, if a final reduction to the residue is not required, the remainder R′ could be returned and used in the further calculations, since it is congruent modulo M with the residue value R.
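
Steps 34 through 48 can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the random value E is drawn with Python's secrets module as a stand-in for the generator circuit 20, and the parameter s bounds E by 0≦E<2^(s).

```python
import secrets


def precompute_P(W, M, delta):
    """Return (n, P) with P = floor(W * 2**(n + delta) / M)  (step 32)."""
    n = max(W.bit_length(), M.bit_length())
    return n, (W << (n + delta)) // M


def modmul_known_operand(V, W, M, P, n, delta, s=0, final_reduce=True):
    """Return R' congruent to (V * W) mod M; R' is the exact residue if final_reduce."""
    q_hat = (V * P) >> (n + delta)                  # step 36: estimated quotient
    E = secrets.randbelow(1 << s) if s > 0 else 0   # step 38: 0 <= E < 2**s
    q_prime = q_hat - E                             # step 40: randomized quotient
    r_prime = V * W - q_prime * M                   # step 44: congruent remainder
    if final_reduce:
        while r_prime >= M:                         # step 46: subtract M as needed
            r_prime -= M
    return r_prime


if __name__ == "__main__":
    W, M, delta = 0xC0FFEE, 0xFFFFFD, 8
    n, P = precompute_P(W, M, delta)
    V = 0xABCDEF12
    print(modmul_known_operand(V, W, M, P, n, delta, s=4) == (V * W) % M)
```

Because q′ never exceeds the exact quotient, R′ is never negative, and the number of subtractions in the final loop grows with E, which is one reason to keep E small.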

Next, one can check (step 50) whether there are other operands V to be used in a modular multiplication with the same pre-known operand W. If so, the procedure may return (path 52) to step 34 and load the next operand V. If there are no additional operands V, the procedure may return to the main program.
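
The reuse of the pre-computed value across many operands V (steps 50 and 52) might be packaged as a closure, as in the sketch below. The name make_multiplier is hypothetical, and the sketch assumes δ=φ with no randomization, so that a single conditional subtraction completes the reduction.

```python
def make_multiplier(W, M, phi):
    """Precompute once for fixed W and M; return a function V -> (V * W) mod M."""
    n = max(W.bit_length(), M.bit_length())
    shift = n + phi                      # delta = phi, so Q - 1 <= q_hat <= Q
    P = (W << shift) // M                # done once (step 32)

    def multiply(V):
        if V < 0 or V.bit_length() > n + phi:
            raise ValueError("V must satisfy 0 <= V < 2**(n + phi)")
        q_hat = (V * P) >> shift
        r = V * W - q_hat * M            # lies in the range 0 <= r < 2*M
        return r - M if r >= M else r    # at most one correction is needed

    return multiply


if __name__ == "__main__":
    mul_by_W = make_multiplier(0xC0FFEE, 0xFFFFFD, phi=8)
    for V in (3, 0x1234, 0xDEADBEEF):    # step 52: loop back with the next V
        print(hex(mul_by_W(V)))
```

The division by M happens only inside make_multiplier; each later call costs two multiplications, a shift, and a subtraction.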

With reference to FIG. 3, the modular multiplication operation may be adapted for operation upon polynomial operands, e.g., in a binary finite field GF(2^(N)). Modular arithmetic with polynomials is similar in some respects to modular arithmetic with integers, although extending this to polynomials over a binary finite field GF(2^(N)) requires certain modifications to the basic operation. Let us first introduce polynomials over a field. To any m-tuple (a_(m−1), . . . , a₁, a₀) of members of a field F, we can associate a polynomial in x of degree (m−1): a_(m−1)x^(m−1)+ . . . +a₁x¹+a₀x⁰. In the case of any binary finite field, the members of the field are {0,1} and so the polynomial coefficients a_(i) are likewise 0 or 1. This concept adapts particularly well to computer hardware and other digital processing circuitry, which are binary in nature, since each bit can be interpreted as a finite field element. For example, we can associate each binary byte value [a₇ a₆ a₅ a₄ a₃ a₂ a₁ a₀] with a corresponding polynomial over GF(2^(N)) of degree 7 or less: a₇x⁷+a₆x⁶+a₅x⁵+a₄x⁴+a₃x³+a₂x²+a₁x+a₀. Hence, e.g., the byte value [01100011] is interpreted as the binary polynomial x⁶+x⁵+x+1. Longer multi-byte sequences may likewise be interpreted as polynomials of higher degree, provided that the polynomial degree (m−1) is less than N in order for the polynomial to belong to the field GF(2^(N)). (Note: when comparing the relative sizes of polynomials, the comparison is performed degree by degree, starting with the polynomial coefficients for the largest degree in x.) Addition and subtraction of polynomials in a field are carried out in the usual manner of adding or subtracting the coefficients for each degree separately,

${{\sum\limits_{i}{a_{i}x^{i}}} \pm {\sum\limits_{i}{b_{i}x^{i}}}} = {\sum\limits_{i}{\left( {a_{i} \pm b_{i}} \right){x^{i}.}}}$

However, for any binary field, the members are {0,1}, so that addition and subtraction of the field elements are performed modulo 2 (0±0=0, 0±1=1, 1±0=1, 1±1=0). Note that, in this case, subtraction is identical to addition. In computer hardware, addition/subtraction modulo 2 is performed with a logical XOR operation upon the array of bits. For example, (x⁶+x⁴+x²+x+1)+(x⁷+x+1)=(x⁷+x⁶+x⁴+x²); or in binary notation [01010111]⊕[10000011]=[11010100].
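
The bit-level interpretation above can be tried directly. The sketch below assumes that a binary polynomial is held in an ordinary Python integer with bit i standing for the coefficient of x^i; the helper names poly_add and poly_to_str are illustrative only.

```python
def poly_add(a, b):
    """Add (or, equivalently, subtract) two binary polynomials held as bit masks."""
    return a ^ b


def poly_to_str(a):
    """Render a bit mask as a polynomial in x, highest degree first."""
    if a == 0:
        return "0"
    terms = []
    for i in range(a.bit_length() - 1, -1, -1):
        if (a >> i) & 1:
            terms.append("1" if i == 0 else "x" if i == 1 else f"x^{i}")
    return " + ".join(terms)


if __name__ == "__main__":
    print(poly_to_str(0b01100011))                         # x^6 + x^5 + x + 1
    print(poly_to_str(poly_add(0b01010111, 0b10000011)))   # x^7 + x^6 + x^4 + x^2
```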

Polynomial multiplication is ordinarily defined (for infinite fields) by:

${{\left( {\sum\limits_{i}{a_{i}x^{i}}} \right) \cdot \left( {\sum\limits_{j}{b_{j}x^{j}}} \right)} = {\sum\limits_{k}{c_{k}x^{k}}}},$

where the coefficient c_(k) is given by the convolution:

$c_{k} = {\sum\limits_{{i + j} = k}{a_{i}{b_{j}.}}}$

(Again, in a binary field, the summation is performed modulo 2.) However, in a finite field, this definition must be modified in order to ensure that the product also belongs to the field. In particular, ordinary polynomial multiplication is followed by modular reduction by a modulus m(x) of degree n (where n is the dimension of the finite field, as in GF(2^(n))). The modulus m(x) is preferably chosen to be an irreducible polynomial (the polynomial analogue of a prime number, i.e. one that cannot be factored into nontrivial polynomials over the same field.) For example, in the AES/Rijndael symmetric block cipher, operations are performed on bytes (polynomials of degree 7 or less) in the binary finite field GF(2⁸), using the particular irreducible polynomial m(x)=x⁸+x⁴+x³+x+1 as the chosen basis for modular reduction when performing polynomial multiplication. As an example of polynomial multiplication in a binary finite field, using the particular m(x) specified for AES: (x⁶+x⁴+x²+x+1)·(x⁷+x+1)=(x¹³+x¹¹+x⁹+x⁸+x⁶+x⁵+x⁴+x³+1), which after reduction, gives (x⁷+x⁶+1).
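
The AES example above can be reproduced with a shift-and-XOR sketch. The bit-mask representation (bit i is the coefficient of x^i) and the function names clmul and poly_mod are assumptions of this illustration; the value 0x11B encodes m(x)=x⁸+x⁴+x³+x+1.

```python
AES_MODULUS = 0x11B   # x^8 + x^4 + x^3 + x + 1


def clmul(a, b):
    """Carry-less (polynomial) product of two bit masks over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add the current shifted copy of a, modulo 2
        a <<= 1
        b >>= 1
    return result


def poly_mod(p, m):
    """Reduce p modulo m by cancelling the leading term until deg(p) < deg(m)."""
    deg_m = m.bit_length() - 1
    while p.bit_length() - 1 >= deg_m:
        p ^= m << (p.bit_length() - 1 - deg_m)
    return p


if __name__ == "__main__":
    product = clmul(0x57, 0x83)
    print(hex(product))                          # 0x2b79
    print(hex(poly_mod(product, AES_MODULUS)))   # 0xc1, i.e. x^7 + x^6 + 1
```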

Let F[x] be the set of polynomials all of whose coefficients are members of a field F. If the modulus m(x) is a polynomial of degree d in F[x], then for polynomials p(x), r(x)∈F[x], we say that p(x) is congruent to r(x) modulo m(x), written as p(x)≡r(x) (mod m(x)), if and only if m(x) divides the polynomial p(x)−r(x); in other words p(x)−r(x) is a polynomial multiple of m(x), that is, p(x)−r(x)=q(x)·m(x) for some polynomial q(x)∈F[x]. Equivalently, p(x) and r(x) have the same remainder upon division by m(x). Modular reduction of a polynomial p(x), which could be an ordinary product of polynomials a(x) and b(x) in F[x], that is, p(x)=a(x)·b(x), involves finding a polynomial quotient q(x) such that the remainder or residue r(x) is a polynomial of degree less than that of m(x), that is, deg(r(x))<d. The polynomial residue r(x), which is congruent with p(x), is the polynomial value we ultimately want. In the binary finite field GF(2^(n)), m(x) will be an irreducible polynomial of degree n and the residue polynomial r(x) that is sought will be of degree less than n; but p(x) and hence also q(x) can be any degree, and at least the polynomial p(x) to be reduced is often of degree larger than n, as for example when p(x) is a product. In any case, the basic problem in any modular reduction method is in efficiently obtaining a quotient, especially for polynomial p(x) and m(x) of large degree.
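
The relation p(x)=q(x)·m(x)+r(x) with deg(r(x))<d can be illustrated by plain polynomial long division over GF(2). This is a sketch of the slow baseline that the method described next avoids repeating; the bit-mask representation and the function names are assumptions of the illustration.

```python
def clmul(a, b):
    """Carry-less product of two GF(2) polynomials held as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_divmod(p, m):
    """Return (q, r) with p = q*m + r over GF(2) and deg(r) < deg(m)."""
    if m == 0:
        raise ZeroDivisionError("modulus polynomial must be nonzero")
    deg_m = m.bit_length() - 1
    q = 0
    while p.bit_length() - 1 >= deg_m:
        shift = p.bit_length() - 1 - deg_m
        q ^= 1 << shift      # record the quotient term x^shift
        p ^= m << shift      # cancel the current leading term of p
    return q, p


if __name__ == "__main__":
    q, r = poly_divmod(0x2B79, 0x11B)
    print(hex(q), hex(r))                     # 0x28 0xc1
    print(clmul(q, 0x11B) ^ r == 0x2B79)      # True
```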

As shown in FIG. 3, a modular multiplication method in accord with the present invention, where one of the polynomial operands w(x) is known in advance, begins by loading (step 60) that known operand w(x), then pre-computing (step 62) a polynomial p(x):=[w(x)·x^(n+δ)]/m(x). The polynomial p(x) will be used to efficiently compute a polynomial quotient q(x) for all modular multiplication operations involving the operand w(x). The other operand v(x), not necessarily known in advance, is loaded (step 64) and the polynomial quotient q(x) associated with the product v(x)·w(x) is computed (step 66) as:

q(x):=v(x)·p(x)/x^(n+δ).

The q(x) can be randomized (step 40) by subtracting a random polynomial value E(x), q′(x):=q(x)−E(x). The random polynomial value E(x) may be generated by any known random or pseudo-random number generator (hardware or software), where the binary value generated is interpreted as a polynomial in the manner already described above. As an option, the random polynomial value E(x) may be constrained to fall within some specified range, such as 0<deg(E(x))<w/2, where here w is the word size.

Next, the modular multiplication operation is carried out (step 44), producing a remainder r′(x):

r′(x):=(v(x)·w(x))−(q′(x)·m(x)).

This remainder r′(x) will be congruent modulo m(x) with the residue value r(x). Note that the choice of δ in the equations given above will determine whether the quotient is exact. If deg(v(x))<n+φ, and δ≧φ−1, then the polynomial q(x) will equal the exact quotient, prior to any randomization. If δ<φ−1, then q(x) will differ from the exact quotient, but deg(r′(x))−deg(r(x)) will be less than a maximum limit defined by δ, deg(Q−q̂)≦φ−δ−2, where Q is the real quotient. Depending upon the needs of the particular application, the residue polynomial r(x) can be calculated from the remainder r′(x) by applying ordinary GF(2^(N)) polynomial reduction with the modulus m(x) to obtain a polynomial smaller than m(x). The polynomial remainder r′(x) or the residue r(x) may be returned for further use by the application. If modular multiplication on another polynomial operand v(x) is to be carried out (step 80) using the same w(x), then the procedure goes back (path 82) to loading (step 64) the next v(x).
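
The polynomial procedure of FIG. 3 can be sketched over GF(2) as follows. This is an illustration under stated assumptions: polynomials are held as Python integers with bit i as the coefficient of x^i, n is taken as deg(m(x)), the division by x^(n+δ) is a right shift that discards the low-order terms, E(x) is drawn with Python's secrets module, and all function names are hypothetical.

```python
import secrets


def clmul(a, b):
    """Carry-less product of two GF(2) polynomials held as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_divmod(p, m):
    """Return (q, r) with p = q*m + r over GF(2) and deg(r) < deg(m)."""
    deg_m = m.bit_length() - 1
    q = 0
    while p.bit_length() - 1 >= deg_m:
        shift = p.bit_length() - 1 - deg_m
        q ^= 1 << shift
        p ^= m << shift
    return q, p


def precompute_p(w, m, delta):
    """Return (n, p) with p = quotient of w * x**(n + delta) by m  (step 62)."""
    n = m.bit_length() - 1
    return n, poly_divmod(w << (n + delta), m)[0]


def poly_modmul_known_operand(v, w, m, p, n, delta, s=0):
    """Return r' congruent to v*w modulo m (steps 64 onward)."""
    q = clmul(v, p) >> (n + delta)             # step 66: polynomial quotient
    e = secrets.randbits(s) if s > 0 else 0    # optional random E(x), deg(E) < s
    q_prime = q ^ e                            # subtraction is XOR over GF(2)
    return clmul(v, w) ^ clmul(q_prime, m)     # r' = v*w - q'*m


if __name__ == "__main__":
    m, w = 0x11B, 0x57                         # the AES modulus and a fixed operand
    n, p = precompute_p(w, m, delta=0)
    print(hex(poly_modmul_known_operand(0x83, w, m, p, n, 0)))   # 0xc1
```

With deg(v(x))<n+φ and δ≧φ−1 the unrandomized result already has degree less than n, so no final reduction is needed; with s>0 a final reduction such as poly_divmod(r_prime, m)[1] recovers the residue.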

1. A system comprising: a multiplier to perform multiplication operations; a storage device coupled to the multiplier to store multiple operands, V, and a known operand W to be multiplied, a modulus M, and intermediate results, including a value P pre-computed as a function of W and M and a maximum possible size of V; a controller to control the multiplier to perform multiple modular multiplication operations for multiple different values of V using the pre-computed value P where W is constant.
 2. The system of claim 1 wherein P=└(W·X^(n+δ))/M┘ for the operand W and a modulus M, where X is selected to represent either a numerical constant or a polynomial variable, n is an integer representing a size of the larger of W and M, and where δ is a selected constant greater than 1.
 3. The system of claim 2 where V<2^(n+φ), and the constant δ is chosen so that δ≧φ.
 4. The system of claim 3 wherein the multiplier is controlled to compute an estimated quotient, q̂, and a congruent remainder for modular operation (V·W) mod M for multiple values of V.
 5. The system of claim 4 wherein the estimated quotient q̂=└(V·P)/X^(n+δ)┘.
 6. The system of claim 4 and further comprising a random number generator to generate a random numerical value E to apply to the estimated quotient.
 7. The system of claim 1 wherein the operands, V and W, are integers.
 8. The system of claim 1 wherein the operands, V and W, are polynomials.
 9. A system comprising: a multiplier to perform multiplication operations; a storage device coupled to the multiplier to store multiple operands, V, and a known operand W to be multiplied, a modulus M, and intermediate results, including a value P pre-computed as a function of W and M and a maximum possible size of V; a controller to control the multiplier to perform multiple modular multiplication operations for multiple different values of V using the pre-computed value P where V is constant.
 10. The system of claim 9 wherein P=└(W·X^(n+δ))/M┘ for the operand W and a modulus M, where X is selected to represent either a numerical constant or a polynomial variable, n is an integer representing a size of the larger of W and M, and where δ is a selected constant greater than 1.
 11. The system of claim 10 where V<2^(n+φ), and the constant δ is chosen so that δ≧φ.
 12. The system of claim 11 wherein the multiplier is controlled to compute an estimated quotient, q̂, and a congruent remainder for modular operation (V·W) mod M for multiple values of V.
 13. The system of claim 12 wherein the estimated quotient q̂=└(V·P)/X^(n+δ)┘.
 14. The system of claim 12 and further comprising a random number generator to generate a random numerical value E to apply to the estimated quotient.
 15. The system of claim 9 wherein the operands, V and W, are integers.
 16. The system of claim 9 wherein the operands, V and W, are polynomials.
 17. A method comprising: loading a first operand W into data storage accessible to a processor unit, wherein W is a first operand to be multiplied by a second operand; pre-computing, using the processor unit, and storing a value P, where P is a function of the operand W and a modulus M; loading a second operand V into the data storage, wherein V is a second operand to be multiplied by W; computing, using the processor unit, an estimated quotient q̂ for the product (V·W) to be reduced modulo M using P and V; calculating a remainder r′ such that it is congruent to (V·W) mod M; and repeating the loading of operand V, computing, and calculating for multiple values of V.
 18. The method of claim 17 wherein P=└(W·X^(n+δ))/M┘ for the operand W and a modulus M, where X is selected to represent either a numerical constant or a polynomial variable, n is an integer representing a size of the larger of W and M, where δ is a selected constant greater than 1, and where V<2^(n+φ), and the constant δ is chosen so that δ≧φ.
 19. The method of claim 18 wherein the estimated quotient q̂=└(V·P)/X^(n+δ)┘.
 20. The method of claim 17 and further comprising generating a random numerical value E to apply to the estimated quotient q̂.